Abstract. For several reasons a database may not satisfy a given set
of integrity constraints (ICs), but most likely most of the information in
it is still consistent with those ICs; and could be retrieved when queries
are answered. Consistent answers to queries wrt a set of ICs have been
characterized as answers that can be obtained from every possible
minimally repaired consistent version of the original database. In this paper
we consider databases that contain null values and are also repaired, if
necessary, using null values. For this purpose, we propose first a
precise semantics for IC satisfaction in a database with null values that is
compatible with the way null values are treated in commercial database
management systems. Next, a precise notion of repair is introduced that
privileges the introduction of null values when repairing foreign key
constraints, in such a way that these new values do not create an infinite
cycle of new inconsistencies. Finally, we analyze how to specify this kind
of repairs of a database that contains null values using disjunctive logic
programs with stable model semantics.
1 Introduction

In databases, integrity constraints (ICs) capture the semantics of the application
domain, and help maintain the correspondence between this domain and the
database when updates are performed. However, there are several reasons for a
database to be or become inconsistent wrt a given set of ICs |H]; and sometimes
it could be difficult, impossible or undesirable to repair the database in order to
restore consistency jH]. This process might be too expensive; useful data might
be lost; it may not be clear how to restore the consistency, and sometimes even
impossible, e.g. in virtual data integration, where the access to the autonomous
data sources may be restricted [9].

In those situations, possibly most of the data is still consistent and can be 
retrieved when queries are posed to the database. In 0, consistent data is char- 
acterized as the data that is invariant under certain minimal forms of restoration 
of consistency, i.e. as the data that is present in all minimally repaired and con- 
sistent versions of the original instance, the so-called repairs. In particular, an 
answer to a query is defined as consistent when it can be obtained as a standard 
answer to the query from every possible repair. 

More precisely, a repair of a database instance D, as introduced in '5', is a new 
instance of the same schema as D that satisfies the given ICs, and makes minimal 
under set inclusion the symmetric set difference with the original instance, taken 
both instances as sets of ground database atoms. 

In 121 I13L 1141 [T7| algorithms and implementations for consistent query an- 
swering (CQA) have been presented, i.e. for retrieving consistent answers from 
inconsistent databases. All of them work only with the original, inconsistent 
database, without restoring its consistency. That is, inconsistencies are solved at
query time. This is in correspondence with the idea that the above mentioned
repairs provide an auxiliary concept for defining the right semantics for consistent
query answers. However, those algorithms apply to restricted classes of queries 
and constraints, basically those for which the intrinsic complexity of CQA is still 
manageable [T5j . 

In 131 EOl IB in] a different approach is taken: database repairs are specified
as the stable models of disjunctive logic programs, and in consequence consis- 
tent query answering amounts to doing cautious or certain reasoning from logic 
programs under the stable model semantics. In this way, it is possible to handle 
any set of universal ICs and any first-order query, and even beyond that, e.g. 
queries expressed in extensions of Datalog. It is important to realize that the 
data complexity of query evaluation in disjunctive logic programs with stable 
model semantics ^Hj matches the intrinsic data complexity of CQA JS], namely
both of them are $\Pi_2^P$-complete.

None of the work cited above considered the possible presence
of null values in the database, let alone their peculiar semantics. Using
null values to repair ICs was only slightly considered in El Ej. This strategy
to deal with referential ICs seemed to be the right way to proceed given the
results presented in ^J, which show that repairing cyclic sets of referential ICs by
introducing arbitrary values from the underlying database domain leads to the
undecidability of CQA.

In [10] the methodology presented in jnHEj, based on specifying repairs using
logic programs with extra annotation constants, was systematically extended
in order to handle both: (a) databases containing null values, and (b) referential
integrity constraints (RICs) whose satisfaction is restored via introduction of
null values. According to the notion of IC satisfaction implicit in ^U], those
introduced null values do not generate any new inconsistencies.

Here, we extend the approach and results in [10] in several ways. First, we
give a precise semantics for integrity constraint satisfaction in the presence of 
null values that is both sensitive to the relevance of the occurrence of a null 
value in a relation, and also compatible with the way null values are usually 
treated in commercial database management systems (the one given in [10] was
much more restrictive). The introduced null values do not generate infinite repair 
cycles through the same or other ICs, which requires a semantics for integrity 
constraints satisfaction under null values that sanctions that tuples with null 
values in attributes relevant to check the IC do not generate any new incon- 
sistencies. A new notion of repair is given accordingly. With the new repair 
semantics CQA becomes decidable for a quite general class of ICs that includes 
universal constraints, referential ICs, NOT NULL-constraints, and foreign key
constraints, even the cyclic cases. 

The logic programs that specify the repairs are modified wrt those
introduced in [10], in such a way that the expected one-to-one correspondence
between the stable models and repairs is recovered for acyclic sets of RICs.
Finally, we study classes of ICs for which the specification can be optimized and 
a lower complexity for CQA can be obtained. 

2 Preliminaries 

We concentrate on relational databases, and we assume we have a fixed relational
schema $\Sigma = (\mathcal{U}, \mathcal{R}, \mathcal{B})$, where $\mathcal{U}$ is the possibly infinite database domain such
that $null \in \mathcal{U}$, $\mathcal{R}$ is a fixed set of database predicates, each of them with a
finite, ordered set of attributes, and $\mathcal{B}$ is a fixed set of built-in predicates, like
comparison predicates. $R[i]$ denotes the attribute in position $i$ of predicate
$R \in \mathcal{R}$. The schema determines a language $\mathcal{L}(\Sigma)$ of first-order predicate logic. A
database instance $D$ compatible with $\Sigma$ can be seen as a finite collection of
ground atoms of the form $R(c_1, ..., c_n)$,^1 where $R$ is a predicate in $\mathcal{R}$ and $c_1, ..., c_n$
are constants in $\mathcal{U}$. Built-in predicates have a fixed extension in every database
instance, not subject to changes. We need to define ICs because their syntax is
fundamental for what follows.

An integrity constraint is a sentence $\psi \in \mathcal{L}(\Sigma)$ of the form:

$$\forall \bar{x}\Big(\bigwedge_{i=1}^{m} P_i(\bar{x}_i) \rightarrow \exists \bar{z}\big(\bigvee_{j=1}^{n} Q_j(\bar{y}_j, \bar{z}_j) \vee \varphi\big)\Big), \qquad (1)$$

where $P_i, Q_j \in \mathcal{R}$, $\bar{x} = \bigcup_{i=1}^{m} \bar{x}_i$, $\bar{z} = \bigcup_{j=1}^{n} \bar{z}_j$, $\bar{y}_j \subseteq \bar{x}$, $\bar{x} \cap \bar{z} = \emptyset$, $\bar{z}_i \cap \bar{z}_j = \emptyset$
for $i \neq j$, and $m \geq 1$. Formula $\varphi$ is a disjunction of built-in atoms from $\mathcal{B}$,
whose variables appear in the antecedent of the implication. We will assume
that there is a propositional atom $false \in \mathcal{B}$ that is always false in a database.
Domain constants other than null may appear instead of some of the variables
in a constraint of the form (1). When writing ICs, we will usually leave the prefix
of universal quantifiers implicit. A wide class of ICs can be accommodated in
this general syntactic class by appropriate renaming of variables if necessary.

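A universal integrity constraint (UIC) has the form (1), but with $\bar{z} = \emptyset$, i.e.
without existentially quantified variables:

$$\forall \bar{x}\Big(\bigwedge_{i=1}^{m} P_i(\bar{x}_i) \rightarrow \bigvee_{j=1}^{n} Q_j(\bar{y}_j) \vee \varphi\Big). \qquad (2)$$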

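A referential integrity constraint (RIC) is of the form (1), but with $m = n = 1$
and $\varphi = \emptyset$, i.e. of the form^2 (here $\bar{x}' \subseteq \bar{x}$ and $P, Q \in \mathcal{R}$):

$$\forall \bar{x}\big(P(\bar{x}) \rightarrow \exists \bar{y}\, Q(\bar{x}', \bar{y})\big). \qquad (3)$$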

Class (1) includes most ICs commonly found in database practice, e.g. a denial
constraint can be expressed as $\forall \bar{x}(\bigwedge_{i=1}^{m} P_i(\bar{x}_i) \rightarrow false)$. Functional
dependencies can be expressed by several implications of the form (1), each of them with
a single equality in the consequent. Partial inclusion dependencies are RICs, and
full inclusion dependencies are universal constraints. We can also specify (single
row) check constraints, which express conditions on each row in a table,
so they can be formulated with one predicate in the antecedent of (1) and only
a formula $\varphi$ in the consequent. For example, $\forall xy(P(x,y) \rightarrow y > 0)$ is a check
constraint.

In the following we will assume that we have a fixed finite set IC of ICs of
the form (1). Notice that sets of constraints of this form are always consistent
in the classical sense, because the empty database always satisfies them.

Example 1. For $\mathcal{R} = \{P, R, S\}$ and $\mathcal{B} = \{>, =, false\}$, the following are ICs:
(a) $\forall xyzw(P(x,y) \wedge R(y,z,w) \rightarrow S(x) \vee (z \neq 2 \vee w \leq y))$ (universal).
(b) $\forall xy(P(x,y) \rightarrow \exists z\, R(x,y,z))$ (referential).
(c) $\forall x(S(x) \rightarrow \exists yz(R(x,y) \vee R(x,y,z)))$. □

^1 Also called database tuples. Finite sequences of constants in $\mathcal{U}$ are simply called
tuples.
^2 To simplify the presentation, we are assuming the existential variables appear in the
last attributes of Q, but they may appear anywhere else in Q.



Notice that defining $\varphi$ in (1) as a disjunction of built-in atoms is not an
important restriction, because an IC that has $\varphi$ as a more complex formula can
be transformed into a set of constraints of the form (1). For example, the
formula $\forall xy(P(x,y) \rightarrow (x > y \vee (x = 3 \wedge y = 8)))$ can be transformed into:
$\forall xy(P(x,y) \rightarrow (x > y \vee x = 3))$ and $\forall xy(P(x,y) \rightarrow (x > y \vee y = 8))$.
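The transformation above is the usual distribution of disjunction over conjunction. A minimal sketch, assuming the condition is given as a disjunction of conjunctions of built-in atoms written as plain strings (the function name `split_condition` and this encoding are illustrative choices, not part of the paper):

```python
from itertools import product

def split_condition(disjuncts):
    """Split a disjunction of conjunctions into several pure disjunctions.

    `disjuncts` is a list of conjunctions, each a list of atom strings.
    The result is a list of disjunctions (tuples of atoms); the original
    condition is equivalent to the conjunction of all of them, so each
    one can serve as the built-in part of a separate constraint.
    """
    # Picking one atom from every conjunction yields one disjunction.
    return [tuple(choice) for choice in product(*disjuncts)]

# (x > y) OR (x = 3 AND y = 8)
parts = split_condition([["x > y"], ["x = 3", "y = 8"]])
```

Here `parts` contains the two disjunctions `x > y ∨ x = 3` and `x > y ∨ y = 8`, one per resulting constraint.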

The dependency graph $\mathcal{G}(IC)$ Xh for a set of ICs IC of the form (1) is defined
as follows: Each database predicate $P$ in $\mathcal{R}$ appearing in IC is a vertex, and there
is a directed edge $(P_i, P_j)$ from $P_i$ to $P_j$ iff there exists a constraint $ic \in IC$ such
that $P_i$ appears in the antecedent of $ic$ and $P_j$ appears in the consequent of $ic$.

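Example 2. For the set IC containing the UICs $ic_1: S(x) \rightarrow Q(x)$ and
$ic_2: Q(x) \rightarrow R(x)$, and the RIC $ic_3: Q(x) \rightarrow \exists y\, T(x,y)$, the following is the
dependency graph $\mathcal{G}(IC)$:

[Figure: dependency graph $\mathcal{G}(IC)$ with edge 1 from S to Q, edge 2 from Q to R, and edge 3 from Q to T.]

The edges are labelled just for reference. Edges 1 and 2 correspond to the
constraints $ic_1$ and $ic_2$, resp., and edge 3 to $ic_3$. □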

A connected component in a graph is a maximal subgraph such that for every
pair $(A, B)$ of its vertices, there is a path from $A$ to $B$ or from $B$ to $A$. For a
graph $\mathcal{G}$, $\mathcal{C}(\mathcal{G}) := \{c \mid c$ is a connected component in $\mathcal{G}\}$; and $\mathcal{V}(\mathcal{G})$ is the set of
vertices of $\mathcal{G}$.

Definition 1. Given a set IC of UICs and RICs, $IC_U$ denotes the set of UICs
in IC. The contracted dependency graph, $\mathcal{G}^C(IC)$, of IC is obtained from $\mathcal{G}(IC)$
by replacing, for every $c \in \mathcal{C}(\mathcal{G}(IC_U))$,^3 the vertices in $\mathcal{V}(c)$ by a single vertex
and deleting all the edges associated to the elements of $IC_U$. Finally, IC is said
to be RIC-acyclic if $\mathcal{G}^C(IC)$ has no cycles. □

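Example 3. (Example 2 cont.) The contracted dependency graph $\mathcal{G}^C(IC)$ is
obtained by replacing in $\mathcal{G}(IC)$ the edges 1 and 2 and their end vertices by a vertex
labelled with {Q, R, S}.

[Figure: $\mathcal{G}^C(IC)$ with a vertex {Q, R, S} and edge 3 from it to T.]

Since there are no loops in $\mathcal{G}^C(IC)$, IC is RIC-acyclic. If we add a new UIC:
$T(x,y) \rightarrow R(y)$ to IC, all the vertices belong to the same connected component.
$\mathcal{G}(IC)$ and $\mathcal{G}^C(IC)$ are, respectively:

[Figure: $\mathcal{G}(IC)$ and $\mathcal{G}^C(IC)$ after adding the new UIC; the latter is a single vertex with a self-loop.]

Since there is a self-loop in $\mathcal{G}^C(IC)$, the new IC is not RIC-acyclic. □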

As expected, a set of UICs is always RIC-acyclic. 
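The RIC-acyclicity test of Definition 1 can be sketched as follows. This is only an illustration under stated assumptions: each constraint is encoded as a hypothetical triple `(kind, antecedent_predicates, consequent_predicates)`, and connected components of the UIC-only graph are computed ignoring edge direction, which is the reading that reproduces Example 3. The names `dependency_edges`, `weak_components` and `is_ric_acyclic` are not from the paper.

```python
from collections import defaultdict

def dependency_edges(ics):
    """One labelled edge (P_i, P_j, kind) per antecedent/consequent pair."""
    return [(p, q, kind) for kind, ante, cons in ics for p in ante for q in cons]

def weak_components(vertices, edges):
    """Map each vertex to a representative of its (undirected) component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for p, q in edges:
        parent[find(p)] = find(q)
    return {v: find(v) for v in vertices}

def is_ric_acyclic(ics):
    edges = dependency_edges(ics)
    vertices = {v for p, q, _ in edges for v in (p, q)}
    # Contract every component of the graph built from the UICs alone.
    comp = weak_components(vertices, [(p, q) for p, q, k in edges if k == "UIC"])
    # Keep only the RIC edges, now between contracted vertices.
    succ = defaultdict(set)
    for p, q, k in edges:
        if k == "RIC":
            succ[comp[p]].add(comp[q])
    # Depth-first search for a cycle (a self-loop counts as a cycle).
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def has_cycle(v):
        color[v] = GREY
        for w in succ[v]:
            if color[w] == GREY or (color[w] == WHITE and has_cycle(w)):
                return True
        color[v] = BLACK
        return False

    return not any(color[v] == WHITE and has_cycle(v) for v in list(succ))

# Example 2: ic1 and ic2 are UICs, ic3 is a RIC.
example2 = [("UIC", ["S"], ["Q"]), ("UIC", ["Q"], ["R"]), ("RIC", ["Q"], ["T"])]
# Example 3: the same set plus the UIC T(x, y) -> R(y).
example3 = example2 + [("UIC", ["T"], ["R"])]
```

Under this encoding, `is_ric_acyclic(example2)` holds, while `example3` contracts to a single vertex with a self-loop and is rejected.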



^3 Notice that for every $c \in \mathcal{C}(\mathcal{G}(IC_U))$, it holds $c \in \mathcal{C}(\mathcal{G}(IC))$.



3 IC Satisfaction in Databases with Null Values 

We deal with incomplete databases in the classic sense that some information 
is represented using null values [2] (cf. also JHI). More recently, the notion of
incomplete database has been used in the context of virtual data integration 
1211 m , referring to data sources that contain a subset of the data of its kind in 
the global system; and in inconsistent databases ITT! fTSJ , referring to the fact 
that inconsistencies may have occurred due to missing information and then, 
repairs are obtained through insertion of new tuples. 

There is no agreement in the literature on the semantics of null values in 
relational databases. There are several different proposals in the research
literature |29lBl l^l28j, in the SQL standard |^|22], but also implicit semantics in
the different ways null values are handled in commercial database management 
systems (DBMSs). 

Not even within the SQL standard is there a homogeneous and global
semantics of integrity constraint satisfaction in databases with null values; rather,
different definitions of satisfaction are given for each type of constraint. Actu- 
ally, in the case of foreign key constraints, three different semantics are sug- 
gested (simple-, partial- and full-match). Commercial DBMSs implement only 
the simple-match semantics for foreign key constraints. 

One of the reasons why it is difficult to agree on a semantics is that a null 
value can be interpreted as an unknown, inapplicable or even withheld value. 
Different null constants can be used for each of these different interpretations 
[27]. Also the use of more than one null value (of the same kind), i.e. labelled
nulls, has been suggested PU], but in this case every new null value uses a new
fresh constant, for which the unique names assumption does not apply. The
latter alternative allows to keep a relationship between null values in different 
attributes or relations. However commercial DBMSs consider only one null value, 
represented by a single constant, that can be given any of the interpretations 
mentioned above. 

In ^ni a semantics for null values was adopted, according to which a tuple 
with a null value in any of its attributes would not be the cause for any inconsis- 
tencies. In other words, it would not be necessary to check tuples with null values 
wrt possible violations of ICs (except for NOT NULL-constraints, of course).
This assumption is consistent in some cases with the practice of DBMSs, e.g. 
in IBM DB2 UDB. Here we will propose a semantics that is less liberal in rela- 
tion to the participation of null values in inconsistencies; a sort of compromise 
solution considering the different alternatives available. 

Example 4. For IC containing only $\psi_1: P(x,y,z) \rightarrow R(y,z)$, the database
$D = \{P(a,b,null)\}$ would be: (a) Consistent wrt the semantics in fTUj because there
is a null value in the tuple. (b) Consistent wrt the simple-match semantics of
SQL:2003 [22], because there is a null value in one of the attributes in the set
$\{P[2], P[3], R[1], R[2]\}$ of attributes that are relevant to check the constraint. (c)
Inconsistent wrt the partial-match semantics in SQL:2003, because there is no
tuple in R with a value b in its first attribute. (d) Inconsistent wrt the full-match
semantics in SQL:2003, because there cannot be a null in an attribute that is
referencing a different table.

If we consider, instead of $\psi_1$, the constraint $\psi_2: P(x,y,z) \rightarrow R(x,y)$, the
same database would be consistent only for the semantics in ^U], because the
other semantics consider only the null value in the attributes that are relevant
to check the constraint, and in this case there is no null value there. □



We want a null-value semantics that generalizes the semantics defined in SQL:2003
[22] and is used by DBMSs, like IBM DB2 UDB. For this reason we consider
only one kind of null value, that is interpreted in the same way for different types
of ICs. We also want our null-value semantics to be uniform for a wide class of
ICs, not only for the type of constraints supported by commercial DBMSs.

Example 5. Consider a database with a table that stores courses with the
professor that taught it and the term, and a table that stores the experience of
each professor in each course with the number of times (s)he has taught the
course. We have a foreign key constraint based on the RIC
$\forall xyz(Course(x,y,z) \rightarrow \exists w\, Exp(y,x,w))$ together with the constraint expressing that table Exp
has {ID, Code} as a key. We can be sure there are no null values in those two
attributes. Now consider the instance D:

    Course                      Exp
    Code   ID     Term          ID     Code   Times
    CS27   21     W04           21     CS27   3
    CS18   34     null          34     CS18   null
    CS50   null   W05           45     CS32   2


In IBM DB2, this database is accepted as consistent. The null values in columns
Term and Times are not relevant to check the satisfaction of the constraints. In
order to check the constraint the only attributes that we need to pay attention
to are ID and Code. If null is in one of these attributes in table Course, the
tuple is considered to be consistent, without checking table Exp. For example
Course(CS50, null, W05) has a null value in ID, therefore DB2 does not check if
there is a tuple in Exp that satisfies the constraint. It does not even check that
there exists a tuple in Exp with attribute Code = CS50.

This behavior for foreign key constraints is called simple-match in the SQL
standard, and is the one implemented in all commercial DBMSs. The partial-
and full-match would not accept the database as consistent, because
partial-match would require Exp to have a tuple (any non-null value, CS50, any value);
and full-match would not allow a tuple with null in attributes ID or Code in
table Course.

If we try to insert tuple (CS41, 18, null) into table Course, it would be rejected
by DB2. This is because the attributes ID and Code are relevant to check the
constraint and are different from null, but there is no tuple in Exp with ID = 18
and Code = CS41. □
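The simple-match behavior described in Example 5 can be sketched in a few lines. This is an illustration, not DB2's implementation: rows are plain tuples, Python's `None` stands for null, and the function name `simple_match_ok` and the column-index parameters are assumptions made for the example.

```python
NULL = None  # stand-in for the single SQL null value

def simple_match_ok(child_rows, parent_rows, child_cols, parent_cols):
    """Simple-match foreign key check.

    A referencing row is accepted outright if any of its foreign key
    columns is null; otherwise its key must occur in the referenced table.
    """
    parent_keys = {tuple(row[c] for c in parent_cols) for row in parent_rows}
    for row in child_rows:
        key = tuple(row[c] for c in child_cols)
        if any(v is NULL for v in key):
            continue  # null in a relevant attribute: no lookup is made
        if key not in parent_keys:
            return False
    return True

# Instance D of Example 5: Course(Code, ID, Term) and Exp(ID, Code, Times).
course = [("CS27", 21, "W04"), ("CS18", 34, NULL), ("CS50", NULL, "W05")]
exp = [(21, "CS27", 3), (34, "CS18", NULL), (45, "CS32", 2)]

# Course(ID, Code) references Exp(ID, Code).
consistent = simple_match_ok(course, exp, child_cols=(1, 0), parent_cols=(0, 1))
rejected = not simple_match_ok(course + [("CS41", 18, NULL)], exp, (1, 0), (0, 1))
```

With this data, `consistent` is true for D, and `rejected` is true for the attempted insertion of (CS41, 18, null).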

Example 6. Consider the single-row check constraint
$\forall ID\, \forall Name\, \forall Salary\,(Emp(ID, Name, Salary) \rightarrow Salary > 100)$ and the database D below. DB2 accepts
this database instance as consistent.

    Emp
    ID     Name   Salary
    32     null   1000
    41     Paul   null

Here, in order to check the satisfaction of the constraint, we only need to verify
that the attribute Salary is bigger than 100; therefore the only attribute that
is relevant to check the constraint is Salary. DBMSs will accept as consistent
any state where the condition (the consequent) evaluates to true or unknown.
The latter is the case here. Tuple (32, null, 50) could not be inserted because
Salary > 100 evaluates to false. Notice that the null values in attributes other
than Salary are not even considered in the verification of the satisfaction. □

When dealing with primary keys, DBMSs use a bag semantics instead of the set
semantics, that is, a table can have two copies of the same tuple. The following
example illustrates the issue.

Example 7. Since the SQL standard allows duplicate rows, i.e. uses the bag
semantics, it is possible to have the database D below. If this database had P[1]
as the primary key, then D would not have been accepted as a consistent
state, i.e. the insertion of the second tuple P(a, b) would have been rejected.

    P
    A     B
    a     b
    a     b

This is one of the cases in which the SQL standard deviates from the relational
model, where duplicates of a row are not considered. In a commercial DBMS a
primary key is checked by adding an index to the primary key and then ensuring
that there are no duplicates. Therefore if we try to check the primary key by
using the associated functional dependency $P(x,y), P(x,z) \rightarrow y = z$ we would
not have the same semantics, since D satisfies the functional dependency in this
classical, first-order representation. □

With the type of first-order constraints that we are considering, we cannot en- 
force a bag semantics, therefore we will assume that D is consistent. 

In order to develop a null-value semantics that goes beyond the ICs supported
by DBMSs, we analyze other examples. 

Example 8. Consider the UIC $\forall xyzwstu(Person(x,y,z,w) \wedge Person(z,s,t,u) \rightarrow u > w + 15)$,
and the database D below. This constraint can be considered
as a multi-row check constraint. If we want to naturally extend the
semantics for single-row check constraints, D would be consistent iff the condition
evaluates to true or unknown. In this case, D would be consistent because the
condition evaluates to unknown for u = null and w = 27. Here the relevant
attributes to check the IC are Name, Mom, Age. □

    Person
    Name   Dad    Mom    Age
    Lee    Rod    Mary   27
    Rod    Joe    Tess   55
    Mary   Adam   Ann    null

Example 9. Consider the UIC $\forall xyz(Course(x,y,z) \rightarrow Employee(y,z))$ and the
database D:

    Course                      Employee
    Code   Term   ID            Term   ID
    CS18   W04    34            W04    null

Since Term, ID is not a primary key of Employee, the constraint is not a foreign
key constraint, and therefore it is not supported by commercial DBMSs. In
contrast to foreign key constraints, now we can have a null value in the referenced
attributes.

In order to extend the semantics used in commercial DBMSs to this case,
we refer to the literature. For example, in |2^ the satisfaction of this type of
constraints is defined as follows: An IC $\forall \bar{x}(P(\bar{x}) \rightarrow \exists \bar{z}\, Q(\bar{y}, \bar{z}))$ is satisfied if,
for every tuple $t_1 \in P$, there exists a tuple $t_2 \in Q$, such that $t_1$ provides less or
equal information than $t_2$, i.e. for every attribute, the value in $t_1$ is the same as
in $t_2$ or the value in $t_1$ is null.

In this example we have the opposite situation: (W04, 34) does not provide
less or equal information than (W04, null). Therefore, we consider the database
to be inconsistent wrt the constraint. Note that the only attributes that are
relevant to check the constraint are Term and ID. □

The examples above show that there are some attributes that are "relevant"
when the satisfaction of a constraint is checked against a database.



Definition 2. For $t$ a term, i.e. a variable or a domain constant, let $pos^R(\psi, t)$
be the set of positions in predicate $R \in \mathcal{R}$ where $t$ appears in $\psi$. The set $\mathcal{A}$ of
relevant attributes for an IC $\psi$ of the form (1) is

$$\mathcal{A}(\psi) = \{R[i] \mid x \text{ is a variable present at least twice in } \psi, \text{ and } i \in pos^R(\psi, x)\} \,\cup\, \{R[i] \mid c \text{ is a constant in } \psi \text{ and } i \in pos^R(\psi, c)\}.$$ □

Remember that $R[i]$ denotes a position (or the corresponding attribute) in
relation $R$. In short, the relevant attributes for a constraint are those involved in
joins, those appearing both in the antecedent and consequent of (1), and those
in $\varphi$.
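Definition 2 can be turned into a short computation. The sketch below assumes a hypothetical encoding in which a constraint is a list of atoms `(predicate, terms)` taken from both the antecedent and the consequent, the variables of the built-in part are listed separately, and constants are passed as a set; none of these names come from the paper.

```python
from collections import Counter

def relevant_attributes(atoms, builtin_terms=(), constants=frozenset()):
    """Return the relevant attributes as (predicate, position) pairs.

    A position is relevant if it holds a constant, or a variable that
    occurs at least twice in the constraint (occurrences inside the
    built-in formula count as well).
    """
    occurrences = Counter(t for _, terms in atoms for t in terms)
    occurrences.update(builtin_terms)
    relevant = set()
    for pred, terms in atoms:
        for i, t in enumerate(terms, start=1):
            if t in constants or occurrences[t] >= 2:
                relevant.add((pred, i))
    return relevant

# P(x, y, z) -> R(x, y): x and y occur twice, z only once.
a_psi = relevant_attributes([("P", ("x", "y", "z")), ("R", ("x", "y"))])

# P(x, y, z) AND R(z, w) -> EXISTS v R(x, v) OR w > 3:
# x links antecedent and consequent, z is a join variable, w is compared.
a_gamma = relevant_attributes(
    [("P", ("x", "y", "z")), ("R", ("z", "w")), ("R", ("x", "v"))],
    builtin_terms=("w",),
)
```

For the first constraint the result is {P[1], P[2], R[1], R[2]}, and for the second {P[1], P[3], R[1], R[2]}.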

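Definition 3. For a set of attributes $\mathcal{A}$ and a predicate $P \in \mathcal{R}$, we denote by
$P^{\mathcal{A}}$ the predicate $P$ restricted to the attributes in $\mathcal{A}$. $D^{\mathcal{A}}$ denotes the database
$D$ with all its database atoms projected onto the attributes in $\mathcal{A}$, i.e.
$D^{\mathcal{A}} = \{P^{\mathcal{A}}(\Pi_{\mathcal{A}}(\bar{t})) \mid P(\bar{t}) \in D\}$, where $\Pi_{\mathcal{A}}(\bar{t})$ is the projection on $\mathcal{A}$ of tuple $\bar{t}$. $D^{\mathcal{A}}$
has the same underlying domain $\mathcal{U}$ as $D$. □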



Example 10. Consider a UIC $\psi: \forall xyz(P(x,y,z) \rightarrow R(x,y))$ and D below.

    P                    R
    A     B     C        A     B
    a     b     a        a     5
    b     c     a        a     2

Since x and y appear twice in $\psi$, $\mathcal{A}(\psi) = \{P[1], R[1], P[2], R[2]\}$. The value in
z should not be relevant to check the satisfaction of the constraint, because we
only want to make sure that the values in the first two attributes in P also
appear in R. Then, checking this is equivalent to checking whether
$\forall xy(P^{\mathcal{A}(\psi)}(x,y) \rightarrow R^{\mathcal{A}(\psi)}(x,y))$ is satisfied by $D^{\mathcal{A}(\psi)}$. For a more
complex constraint, such as $\gamma: \forall xyzw(P(x,y,z) \wedge R(z,w) \rightarrow \exists v\, R(x,v) \vee w > 3)$,
variable x is relevant to check the implication, z is needed to do the join, and w
is needed to check the comparison, therefore $\mathcal{A}(\gamma) = \{P[1], R[1], P[3], R[2]\}$.

    $D^{\mathcal{A}(\psi)}$:                          $D^{\mathcal{A}(\gamma)}$:
    $P^{\mathcal{A}(\psi)}$      $R^{\mathcal{A}(\psi)}$          $P^{\mathcal{A}(\gamma)}$      $R^{\mathcal{A}(\gamma)}$
    A     B       A     B          A     C       A     B
    a     b       a     5          a     a       a     5
    b     c       a     2          b     a       a     2



An important observation we can make from the examples above is that,
roughly speaking, a constraint is satisfied if any of the relevant attributes has
a null or the constraint is satisfied in the traditional way (i.e. first-order
satisfaction and null values treated as any other constant). We introduce a special
predicate $IsNull(\cdot)$, with $IsNull(c)$ true iff c is null, instead of using the built-in
comparison atom $c = null$, because in traditional DBMSs this equality would
always be evaluated as unknown (as observed in [29], the unique names assumption
does not apply to null values).

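Definition 4. A constraint $\psi$ as in (1) is satisfied in the database instance $D$,
denoted $D \models_N \psi$, iff $D^{\mathcal{A}(\psi)} \models \psi^N$, where $\psi^N$ is

$$\forall \bar{x}\Big(\bigwedge_{i=1}^{m} P_i^{\mathcal{A}(\psi)}(\bar{x}_i) \rightarrow \Big(\bigvee_{v_j \in \mathcal{A}(\psi) \cap \bar{x}} IsNull(v_j) \vee \exists \bar{z}\big(\bigvee_{j=1}^{n} Q_j^{\mathcal{A}(\psi)}(\bar{y}_j, \bar{z}_j) \vee \varphi\big)\Big)\Big), \qquad (4)$$

where $\bar{x} = \bigcup_{i=1}^{m} \bar{x}_i$ and $\bar{z} = \bigcup_{j=1}^{n} \bar{z}_j$. $D^{\mathcal{A}(\psi)} \models \psi^N$ refers to classical first-order
satisfaction where null is treated as any other constant in $\mathcal{U}$. □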



We can see from Definition 4 that there are basically two cases for constraint
satisfaction: (a) If there is a null in any of the relevant attributes in the
antecedent, then the constraint is satisfied. (b) If no null values appear in them,
then the second disjunct in the consequent of formula (4) has to be checked, i.e.
the consequent of the original IC restricted to the relevant attributes. This can
be done as usual, treating nulls as any other constant.

Formula (4) is a direct translation of formula (1) that keeps the relevant
attributes. In particular, if the original constraint is universal, so is the
transformed version. Notice that the transformed constraint is domain independent,
and then its satisfaction can be checked by restriction to the active domain.

As mentioned before, the semantics for IC satisfaction introduced in ^D] 
considered that tuples with null never generated any inconsistencies, even when 
the null value was not in a relevant attribute. For example, under the semantics 
in ^U], the instance $\{P(b, null)\}$ would be consistent wrt the IC $\forall xy(P(x,y) \rightarrow R(x))$,
but it is intuitively clear that there should be a tuple $R(b)$. The new
semantics corrects this, and adjusts to the semantics implemented in commercial 
DBMS. 

Notice that in a database without null values, Definition 4 (like the definition
in [10]) coincides with the traditional, first-order definition of IC satisfaction.
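The two cases of Definition 4 can be checked mechanically. The following is a minimal sketch under simplifying assumptions that are not part of the paper: constraints contain only variables (no constants), a database is a dictionary from predicate names to lists of tuples, `None` stands for null, and the built-in part is supplied as a Python predicate over the variable assignment. The function name `satisfies` is hypothetical.

```python
from itertools import product

NULL = None  # stand-in for the single null constant

def satisfies(db, antecedent, consequent, builtin_vars=(), builtin=lambda th: False):
    """Check D |=_N psi for psi: antecedent -> EXISTS z (consequent OR builtin)."""
    counts = {}
    for _, vs in antecedent + consequent:
        for v in vs:
            counts[v] = counts.get(v, 0) + 1
    for v in builtin_vars:
        counts[v] = counts.get(v, 0) + 1
    relevant = {v for v, n in counts.items() if n >= 2}

    def extend(theta, vs, row, only_relevant):
        """Bind the variables of one atom to a row; None if bindings clash."""
        theta = dict(theta)
        for v, c in zip(vs, row):
            if only_relevant and v not in relevant:
                continue  # non-relevant positions are projected away
            if theta.setdefault(v, c) != c:
                return None
        return theta

    # Enumerate the joins of the antecedent; null joins like any constant.
    for rows in product(*(db.get(p, ()) for p, _ in antecedent)):
        theta = {}
        for (_, vs), row in zip(antecedent, rows):
            theta = extend(theta, vs, row, False)
            if theta is None:
                break
        if theta is None:
            continue
        # Case (a): a null in a relevant antecedent attribute.
        if any(theta[v] is NULL for v in relevant if v in theta):
            continue
        # Case (b): the consequent restricted to the relevant attributes.
        if builtin(theta):
            continue
        if any(extend(theta, vs, row, True) is not None
               for p, vs in consequent for row in db.get(p, ())):
            continue
        return False
    return True

# P(x, y) -> R(x) on {P(b, null)}: the null is not in a relevant attribute.
ok = satisfies({"P": [("b", NULL)]}, [("P", ("x", "y"))], [("R", ("x",))])
```

Here `ok` is false, matching the observation that a tuple R(b) is expected under the new semantics.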

Example 11. Given the ICs: (a) $\forall xyz(P(x,y,z) \rightarrow R(x,y))$, (b) $\forall x(T(x) \rightarrow \exists yz\, P(x,y,z))$,
the database instance D below is consistent.

    P                    R              T
    A     B     C        D     E        …
    a     d     e        a     d
    b     null  g

For (a), the variables x and y are relevant to check the constraint, therefore
$\mathcal{A}_1 = \{P[1], R[1], P[2], R[2]\}$; and for (b), the variable x is relevant to check the
constraint; therefore $\mathcal{A}_2 = \{P[1], T[1]\}$.

    $D^{\mathcal{A}_1}$:                    $D^{\mathcal{A}_2}$:
    $P^{\mathcal{A}_1}$       $R^{\mathcal{A}_1}$        $P^{\mathcal{A}_2}$     $T^{\mathcal{A}_2}$
    A     B      D     E        A        …
    a     d      a     d        a
    b     null                  b

To check if $D \models_N \forall xyz(P(x,y,z) \rightarrow R(x,y))$, we need to check if
$D^{\mathcal{A}_1} \models \forall xy(P^{\mathcal{A}_1}(x,y) \rightarrow (IsNull(x) \vee IsNull(y) \vee R^{\mathcal{A}_1}(x,y)))$. For x = a and y = d,
$D^{\mathcal{A}_1} \models P^{\mathcal{A}_1}(a,d)$, but none of them is a null value, i.e. $IsNull(a)$ and $IsNull(d)$
are both false, therefore we need to check if $D^{\mathcal{A}_1} \models R^{\mathcal{A}_1}(a,d)$. For x = b and
y = null, $D^{\mathcal{A}_1} \models P^{\mathcal{A}_1}(b, null)$, and since $D^{\mathcal{A}_1} \models IsNull(null)$, the constraint is
satisfied. The same analysis can be done to prove that D satisfies constraint (b),
this is by checking $D^{\mathcal{A}_2} \models \forall x(T^{\mathcal{A}_2}(x) \rightarrow (IsNull(x) \vee P^{\mathcal{A}_2}(x)))$.

If we add tuple $P(f, d, null)$ to D, it would become inconsistent wrt constraint
(a), because $D^{\mathcal{A}_1} \not\models (P^{\mathcal{A}_1}(f,d) \rightarrow (IsNull(f) \vee IsNull(d) \vee R^{\mathcal{A}_1}(f,d)))$. □

Example 12. Consider the IC $\psi: \forall xywz((P_1(x,y,w) \wedge P_2(y,z)) \rightarrow \exists u\, Q(x,z,u))$
and the database D:

    P1                   P2             Q
    A     B     C        D     E        F     G     H
    a     b     c        b     a        a     a     c
    d     null  c        e     c        b     null  c
    b     e     null     d     null     b     c     d
    null  b     b        null  b        null  c     a

Variables x, y and z are relevant to check the constraint, therefore the set of
relevant attributes is $\mathcal{A}(\psi) = \{P_1[1], P_1[2], P_2[1], P_2[2], Q[1], Q[2]\}$. Then we need to
check if $D^{\mathcal{A}(\psi)} \models \forall xyz((P_1^{\mathcal{A}(\psi)}(x,y) \wedge P_2^{\mathcal{A}(\psi)}(y,z)) \rightarrow (IsNull(x) \vee IsNull(y) \vee
IsNull(z) \vee Q^{\mathcal{A}(\psi)}(x,z)))$, where $D^{\mathcal{A}(\psi)}$ is

    $P_1^{\mathcal{A}(\psi)}$       $P_2^{\mathcal{A}(\psi)}$       $Q^{\mathcal{A}(\psi)}$
    A     B        D     E        F     G
    a     b        b     a        a     a
    d     null     e     c        b     null
    b     e        d     null     b     c
    null  b        null  b        null  c

When checking the satisfaction of $D^{\mathcal{A}(\psi)} \models \psi^N$, null is treated as any other
constant. For example for x = d, y = null and z = b, the antecedent of the rule
is satisfied since $P_1^{\mathcal{A}(\psi)}(d, null) \in D^{\mathcal{A}(\psi)}$ and $P_2^{\mathcal{A}(\psi)}(null, b) \in D^{\mathcal{A}(\psi)}$. If null had been
treated as a special constant, with no unique names assumption applied to it,
the antecedent would have been false. For these values the consequent is also
satisfied, because $IsNull(null)$ is true. In this example, $D^{\mathcal{A}(\psi)} \models \psi^N$, and the
database satisfies the constraint. □

Notice that in order for formula (4) to have $\bar{z} \neq \emptyset$, i.e. existential quantifiers,
there must exist an atom $Q_j(\bar{y}_j, \bar{z}_j)$ in the corresponding IC of the form (1), such
that $\bar{z}_j$ has a repeated variable. This is because that is the only case in which a
constraint can have $(\mathcal{A}(\psi) \setminus \bar{x}) \neq \emptyset$.

Example 13. Given $\psi: \forall xy(P(x,y) \rightarrow \exists z\, Q(x,z,z))$ and $D = \{P(a,b), P(null,c),
Q(a,null,null)\}$, $\mathcal{A}(\psi) = \{P[1], Q[1], Q[2], Q[3]\}$. D satisfies $\psi$ iff $D^{\mathcal{A}(\psi)} \models \psi^N$,
with $D^{\mathcal{A}(\psi)} = \{P^{\mathcal{A}(\psi)}(a), P^{\mathcal{A}(\psi)}(null), Q^{\mathcal{A}(\psi)}(a,null,null)\}$ and
$\psi^N: \forall x(P^{\mathcal{A}(\psi)}(x) \rightarrow (IsNull(x) \vee \exists z\, Q^{\mathcal{A}(\psi)}(x,z,z)))$. The constraint is satisfied, because for x = a it
is satisfied given that there exists the satisfying value null for z; and for x = null
the constraint is satisfied given that $IsNull(null)$ is true. □

The predicate IsNull also allows us to specify NOT NULL-constraints, which
are common in commercial DBMSs, and prevent certain attributes from taking a
null value. As discussed before, this constraint is different from having $x \neq null$.

Definition 5. A NOT NULL-constraint (NNC) is a denial constraint of the
form

$$\forall \bar{x}(P(\bar{x}) \wedge IsNull(x_i) \rightarrow false), \qquad (5)$$

where $x_i \in \bar{x}$ is in the position of the attribute that cannot take null values. For
a NNC $\psi$, we define $D \models_N \psi$ iff $D \models \psi$ in the classical sense, treating null as
any other constant. □

Notice that a NNC is not of the form (1), because it contains the constant null.
This is why we give a separate definition for them. By adding NNCs we are able
to represent all the constraints of commercial DBMSs, i.e. primary keys, foreign
key constraints, check constraints and NOT NULL-constraints.

Our semantics is a natural extension of the semantics used in commercial 
DBMSs. Note that: (a) In a DBMS there will never be a join between a null 
and another value (null or not), (b) Any check constraint with comparison, 
e.g. <, >, =, will never create an inconsistency when comparing a null value with
any other value. These two features justify our decision in Definition 4 to include
the attributes in the joins and the elements in $\varphi$ among the attributes that are
checked to be null with IsNull, because if there is a null in them an inconsistency 
will never arise. 

Our semantics of IC satisfaction with null values allows us to integrate our 
results in a compatible way with current commercial implementations; in the 
sense that the database repairs we will introduce later on would be accepted as 
consistent by current commercial implementations (for the classes of constraints 
that can be defined and maintained by them). 

4 Repairs of Incomplete Databases 

Consider a database instance $D$, possibly with null values, that is inconsistent, i.e.
$D$ does not satisfy a given set IC of ICs of the kind defined in Section 3 or
NNCs. A repair of $D$ will be a new instance with the same schema as $D$ that
satisfies IC and minimally differs from $D$.

More formally, for database instances $D, D'$ over the same schema, the distance
between them was defined in [2] by means of the symmetric difference
$\Delta(D, D') = (D \setminus D') \cup (D' \setminus D)$. Correspondingly, a repair of $D$ wrt IC was
defined as an instance $D'$ that satisfies IC and minimizes $\Delta(D, D')$ under set
inclusion. Finally, a tuple $\bar{t}$ was defined as a consistent answer to a query $Q(\bar{x})$
in $D$ wrt IC if $\bar{t}$ is an answer to $Q(\bar{x})$ from every repair of $D$ wrt IC. The definition
of repair given in [2] implicitly ignored the possible presence of null values.
Similarly, in the works that followed the repair semantics of [2], no null values
were used in repairs.

Example 14. Consider the database $D$ below and the RIC: $Course(ID, Code) \rightarrow \exists Name\; Student(ID, Name)$.
$D$ is inconsistent, because there is no tuple in Student for tuple $Course(34, C18)$ in Course.

  Course            Student
  ID   Code         ID   Name
  21   C15          21   Ann
  34   C18          45   Paul

The database can be minimally repaired by deleting the inconsistent
tuple or by inserting a new tuple into table Student. In the latter case, since the
value for attribute Name is unknown, we should consider repairs with all the
possible values in the domain. Therefore, for the repair semantics introduced in
[2], the repairs are of the two following forms

  Course            Student            Course            Student
  ID   Code         ID   Name          ID   Code         ID   Name
  21   C15          21   Ann           21   C15          21   Ann
                    45   Paul          34   C18          45   Paul
                                                         34   μ

for all the possible values of $\mu$ in the domain, obtaining a possibly infinite number
of repairs. □



The problem of deciding if a tuple is a consistent answer to a query wrt a set
of universal and referential ICs is undecidable for this repair semantics [11].

An alternative approach is to consider that, in a way, the value $\mu$ in Example
14 is an unknown value, and therefore, instead of making it take all the values in
the domain, we could use it as a null value. We will pursue this idea, which
requires modifying the notion of repair accordingly. It will turn out that consistent
query answering will become decidable for universal and referential constraints.

Example 15. (Example 14 cont.) By using null values, there will be only two
repairs:

  Repair 1:                            Repair 2:
  Course            Student            Course            Student
  ID   Code         ID   Name          ID   Code         ID   Name
  21   C15          21   Ann           21   C15          21   Ann
                    45   Paul          34   C18          45   Paul
                                                         34   null

Here null tells us that there is a tuple with 34 in the first attribute, but unknown
value in the second. □



Now we define in precise terms the notion of repair of a database with null values. 

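Definition 6. [6] Let $D, D', D''$ be database instances over the same schema and
domain $\mathcal{U}$. It holds $D' \leq_D D''$ iff: (a) for every database atom $P(\bar{a}) \in \Delta(D, D')$,
with $\bar{a} \in (\mathcal{U} \setminus \{null\})$,⁴ it holds $P(\bar{a}) \in \Delta(D, D'')$; and (b) for every atom
$Q(\bar{a}, \overline{null})$⁵ $\in \Delta(D, D')$, with $\bar{a} \in (\mathcal{U} \setminus \{null\})$, there exists a $\bar{b} \in \mathcal{U}$ such that
$Q(\bar{a}, \bar{b}) \in \Delta(D, D'')$ and $Q(\bar{a}, \bar{b}) \notin \Delta(D, D')$. □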

Definition 7. Given a database instance $D$ and a set IC of ICs of the form (1)
and NNCs, a repair of $D$ wrt IC is a database instance $D'$ over the same schema,
such that $D' \models_N IC$ and $D'$ is $\leq_D$-minimal in the class of database instances
that satisfy IC wrt $\models_N$ and share the schema with $D$, i.e. there is no database
$D''$ in this class with $D'' <_D D'$, where $D'' <_D D'$ means $D'' \leq_D D'$ but not
$D' \leq_D D''$. The set of repairs of $D$ wrt IC is denoted with $Rep(D, IC)$. □

In the absence of null, this definition of repair coincides with the one in [2]. 

Example 16. The database instance $D = \{Q(a, b), P(a, c)\}$ is inconsistent wrt
the ICs $\psi_1 : (P(x, y) \rightarrow \exists z\, Q(x, z))$ and $\psi_2 : (Q(x, y) \rightarrow y \neq b)$,⁶ because
$D \not\models_N \psi_2$. The database has two repairs wrt $\{\psi_1, \psi_2\}$, namely $D_1 = \{\}$, with
$\Delta(D, D_1) = \{Q(a, b), P(a, c)\}$, and $D_2 = \{P(a, c), Q(a, null)\}$, with $\Delta(D, D_2) =
\{Q(a, b), Q(a, null)\}$. Notice that $D_2 \not\leq_D D_1$ because $Q(a, null) \in \Delta(D, D_2)$ and
there is no constant $d \in \mathcal{U}$ such that $Q(a, d) \in \Delta(D, D_1)$ and $Q(a, d) \notin \Delta(D, D_2)$.
Similarly, $D_1 \not\leq_D D_2$, because $P(a, c) \in \Delta(D, D_1)$ and $P(a, c) \notin \Delta(D, D_2)$. □

Example 17. If the database instance is $D = \{P(a, null), P(b, c), R(a, b)\}$ and IC
consists only of $(P(x, y) \rightarrow \exists z\, R(x, z))$, then there are two repairs: $D_1 = \{P(a, null),
P(b, c), R(a, b), R(b, null)\}$, with $\Delta(D, D_1) = \{R(b, null)\}$, and $D_2 = \{P(a, null),
R(a, b)\}$, with $\Delta(D, D_2) = \{P(b, c)\}$. Notice, for example, that $D_3 = \{P(a, null),
P(b, c), R(a, b), R(b, d)\}$, for any $d \in \mathcal{U}$ different from null, is not a repair: since
$\Delta(D, D_3) = \{R(b, d)\}$, we have $D_1 <_D D_3$ and, therefore, $D_3$ is not $\leq_D$-minimal.
□
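
The comparison of Definition 6 can be sketched in Python as follows. This is only an illustrative encoding under stated assumptions: atoms are `(predicate, arguments)` pairs, null is the string `"null"`, and the helper names `delta`, `leq` and `strictly_less` are hypothetical. For condition (b), the sketch looks for an atom that agrees with the null-containing atom on all its non-null positions, which reflects the remark that the nulls may be placed anywhere in the atom.

```python
NULL = "null"  # assumption: null is modelled as an ordinary string constant


def delta(d, d1):
    """Symmetric difference between two instances (sets of atoms)."""
    return (d - d1) | (d1 - d)


def agrees_on_non_nulls(atom, other):
    """True if `other` has the same predicate and matches `atom` on its non-null positions."""
    (p, args), (q, other_args) = atom, other
    return p == q and len(args) == len(other_args) and all(
        a == NULL or a == b for a, b in zip(args, other_args)
    )


def leq(d, d1, d2):
    """Check D1 <=_D D2 following conditions (a) and (b) of Definition 6."""
    delta1, delta2 = delta(d, d1), delta(d, d2)
    for atom in delta1:
        if NULL not in atom[1]:
            if atom not in delta2:  # condition (a)
                return False
        elif not any(  # condition (b)
            agrees_on_non_nulls(atom, other) and other not in delta1
            for other in delta2
        ):
            return False
    return True


def strictly_less(d, d1, d2):
    """Check D1 <_D D2, i.e. D1 <=_D D2 but not D2 <=_D D1."""
    return leq(d, d1, d2) and not leq(d, d2, d1)


# Instances of Example 17.
D = {("P", ("a", NULL)), ("P", ("b", "c")), ("R", ("a", "b"))}
D1 = D | {("R", ("b", NULL))}
D2 = {("P", ("a", NULL)), ("R", ("a", "b"))}
D3 = D | {("R", ("b", "d"))}

print(strictly_less(D, D1, D3))  # True
print(leq(D, D2, D3))            # False
```

On the instances of Example 17, the sketch reports that the null-based insertion is strictly preferred to the insertion of the arbitrary constant d, while the deletion-based repair is incomparable with it.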

Example 18. Consider the UIC $\forall xy\,(P(x, y) \rightarrow T(x))$ and the RIC $\forall x\,(T(x) \rightarrow
\exists y\, P(y, x))$, and the inconsistent database $D = \{P(a, b), P(null, a), T(c)\}$. In this
case, we have a RIC-cyclic set of ICs. The four repairs are

  i   D_i                                              Δ(D, D_i)
  1   {P(a, b), P(null, a), T(c), P(null, c), T(a)}    {T(a), P(null, c)}
  2   {P(a, b), P(null, a), T(a)}                      {T(a), T(c)}
  3   {P(null, a), T(c), P(null, c)}                   {P(a, b), P(null, c)}
  4   {P(null, a)}                                     {P(a, b), T(c)}

Notice that, for example, the additional instance $D_5 = \{P(a, b), P(null, a), T(c),
P(a, c), T(a)\}$, with $\Delta(D, D_5) = \{T(a), P(a, c)\}$, satisfies IC, but is not a repair
because $D_1 <_D D_5$. □

⁴ That $\bar{a} \in (\mathcal{U} \setminus \{null\})$ means that each of the elements in tuple $\bar{a}$ belongs to $(\mathcal{U} \setminus \{null\})$.
⁵ $\overline{null}$ is a tuple of null values that, to simplify the presentation, are placed in the last
  attributes of $Q$, but could be anywhere else in $Q$.
⁶ The second IC is non-generic in the sense that it implies some ground database
  literals. Non-generic ICs have in general been left aside in the literature on CQA.

The previous example shows that we obtain a finite number of repairs (with
finite extension). If we repaired the database by using the non-null constants
in the infinite domain with the repair semantics of [2], we would obtain an
infinite number of repairs, and infinitely many of them with infinite extension,
as considered in [11].

Example 19. Consider a schema with a table $R(X, Y)$, with primary key $R[1]$,
and a table $S(U, V)$, with $S[2]$ a foreign key to table $R$. The ICs are $\forall xyz\,(R(x, y)
\wedge R(x, z) \rightarrow y = z)$ and $\forall uv\,(S(u, v) \rightarrow \exists y\, R(v, y))$, plus the NNC $\forall xy\,(R(x, y) \wedge
IsNull(x) \rightarrow false)$. Since the original database satisfies the NNC and there is no
constraint with an existential quantifier over $R[1]$, the NNC will not be violated
while trying to solve other inconsistencies. We would have a non-conflicting
interaction of RICs and NNCs. Here $D = \{R(a, b), R(a, c), S(e, f), S(null, a)\}$ is
inconsistent and its repairs are $D_1 = \{R(a, b), S(e, f), S(null, a), R(f, null)\}$,
$D_2 = \{R(a, c), S(e, f), S(null, a), R(f, null)\}$, $D_3 = \{R(a, b), S(null, a)\}$ and
$D_4 = \{R(a, c), S(null, a)\}$. □

If a given database D is consistent wrt a set of ICs, then there is only one repair, 
that coincides with D. The following example shows what can happen if we 
have a conflicting interaction of a RIC containing an existential quantifier over 
a variable with an additional NNC that prevents that variable from taking null 
values. 

Example 20. Consider the database $D = \{P(a), P(b), Q(b, c)\}$, the RIC $\forall x\,(P(x)
\rightarrow \exists y\, Q(x, y))$, and the NNC $\forall xy\,(Q(x, y) \wedge IsNull(y) \rightarrow false)$ over an
existentially quantified attribute in the RIC. We cannot repair as expected using
null values. Actually, the repairs are $\{P(b), Q(b, c)\}$, corresponding to a tuple
deletion, but also those of the form $\{P(a), P(b), Q(b, c), Q(a, \mu)\}$, for every
$\mu \in (\mathcal{U} \setminus \{null\})$, that are obtained by tuple insertions. We thus recover the
repair semantics of [2]. □

With an appropriate conflicting interaction of RICs and NNCs we could recover
in our setting the situation where infinitely many repairs and infinitely many
with finite extension appear (c.f. the remark after Example 18). Our repair semantics
above could be modified in order to repair only through tuple deletions in this
case, when null values cannot be used due to the presence of conflicting NNCs.
This could be done as follows: If $Rep(D, IC)$ is the class of repairs according to
Definitions 6 and 7, the alternative class of repairs, $Rep^d(D, IC)$, that prefers
tuple deletions over insertions with arbitrary non-null elements of the domain
due to the presence of conflicting NNCs, can be defined by

  $Rep^d(D, IC) := \{D' \mid D' \in Rep(D, IC)$ and there is no $D'' \in Rep(D, IC')$ with $D'' <_D D'\}$,

where $IC'$ is IC without the (conflicting) NNCs.

Since the semantics introduced in Definitions 6 and 7 is easier to deal with, and
in order to avoid repairs like those in Example 20, we will make the following

Assumption: Our sets IC, consisting of ICs of the form (1) and NNCs, are
non-conflicting, in the sense that there is no NNC on an attribute that is existentially
quantified in an IC of the form (1).

In this way, we will always be able to repair RICs by tuple deletions or tuple
insertions with null values. Notice that every set of ICs consisting of primary key
constraints (with the keys set to be non-null), foreign key constraints, and check
constraints satisfies this condition. Also note that if there are no conflicting
NNCs, the original semantics and the one based on $Rep^d$-repairs coincide. The
repair programs introduced in Section 5 specify the $Rep^d$-repairs, so
our assumption is also relevant from the computational point of view.

Notice that with our repair semantics, we can prove that there will always
exist a repair for a database $D$ and a set IC of non-conflicting constraints;
and that the set of repairs is finite and each of them is finite in extension (i.e.
each database relation is finite), because a database instance with no tuples
always satisfies the constraints, and the domain of the repairs can be restricted
to $adom(D) \cup const(IC) \cup \{null\}$, where $adom(D)$ is the active domain of the
original instance $D$ and $const(IC)$ is the set of constants that appear in the
constraints.

Proposition 1. Given a database $D$ and a set IC of non-conflicting ICs: (a)
For every repair $D' \in Rep(D, IC)$, $adom(D') \subseteq adom(D) \cup const(IC) \cup \{null\}$.
(b) The set $Rep(D, IC)$ of repairs is non-empty and finite; and every $D' \in
Rep(D, IC)$ is finite.⁷ □

Theorem 1. The problem of determining if a database $D'$ is a repair of $D$ wrt
a set IC consisting of ICs of the form (1) and NNCs⁸ is coNP-complete. □

Definition 8. [2] Given a database $D$, a set of ICs IC, and a query $Q(\bar{x})$, a
ground tuple $\bar{t}$ is a consistent answer to $Q$ wrt IC in $D$ iff for every $D' \in
Rep(D, IC)$, $D' \models Q[\bar{t}]$. If $Q$ is a sentence (boolean query), then yes is a
consistent answer iff $D' \models Q$ for every $D' \in Rep(D, IC)$. Otherwise, the consistent
answer is no. □
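
Definition 8 amounts to intersecting the answers obtained from each repair. The sketch below illustrates this on the two repairs of Example 15. It is only a hedged illustration: the encoding of atoms as `(predicate, arguments)` pairs, the function names, and the choice of simple projection queries evaluated classically (with null as an ordinary constant) are assumptions made for this example, since no particular query answering semantics is fixed at this point.

```python
NULL = "null"  # assumption: null is modelled as an ordinary string constant


def consistent_answers(repairs, query):
    """Tuples returned by `query` in every repair (Definition 8)."""
    answer_sets = [set(query(repair)) for repair in repairs]
    return set.intersection(*answer_sets) if answer_sets else set()


# The two repairs of Example 15.
repair1 = {
    ("Course", (21, "C15")),
    ("Student", (21, "Ann")),
    ("Student", (45, "Paul")),
}
repair2 = repair1 | {("Course", (34, "C18")), ("Student", (34, NULL))}


def course_ids(instance):
    """Hypothetical query: project Course on its first attribute."""
    return {(args[0],) for (pred, args) in instance if pred == "Course"}


def student_ids(instance):
    """Hypothetical query: project Student on its first attribute."""
    return {(args[0],) for (pred, args) in instance if pred == "Student"}


print(sorted(consistent_answers([repair1, repair2], course_ids)))   # [(21,)]
print(sorted(consistent_answers([repair1, repair2], student_ids)))  # [(21,), (45,)]
```

Course 34 is not a consistent answer because it disappears in the repair obtained by tuple deletion.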

In this formulation of CQA we are using a notion $D' \models Q[\bar{t}]$ of satisfaction of
queries in a database with null values. At this stage, we are not committing to
any particular semantics for query answering in this kind of databases. In the
rest of the paper, we will assume that we have such a notion, say $\models^q_N$, that can
be applied to queries in databases with null values. Some proposals can be found
in the literature. In principle, $\models^q_N$ may be orthogonal to the notion
$\models_N$ for satisfaction of ICs. However, in the extended version of this paper we will
present a semantics for query answering that is compatible with the one for IC
satisfaction. For the moment we are going to assume that $\models^q_N$ can be computed
in polynomial time in data for safe first-order queries, and that it coincides
with the classical first-order semantics for queries and databases without null
values. We will also assume in the following that queries are safe [22], a sufficient
syntactic condition for domain independence.

⁷ For proofs of all results go to www.scs.carleton.ca/~lbravo/IIDBdemos.pdf
⁸ In this case we do not need the assumption of non-conflicting ICs.

The decision problem of consistent query answering is

  $CQA(Q, IC) = \{(D, \bar{t}) \mid \bar{t}$ is a consistent answer to $Q(\bar{x})$ wrt IC in $D\}$.

Since we have $Q$ and IC as parameters of the problem, we are interested in the
data complexity of this problem, i.e. in terms of the size of the database. It
turns out that CQA for FOL queries is decidable, in contrast to what happens
with the classic repair semantics [2], as established in [11].

Theorem 2. Consistent query answering for first-order queries wrt
non-conflicting sets of ICs of the form (1) and NNCs is decidable. □

The ideas behind the proof are as follows: (a) There is a finite number of database
instances that are candidates to be repairs, given that they use only the active
domain of the original instance, null and the constants in the ICs. (b) The
satisfaction of ICs in the candidates can be decided by restriction to the active
domain, given that the ICs are domain independent. (c) Checking if $D_1 <_D D_2$
can be effectively decided. (d) The answers to safe first-order queries can be
effectively computed.

The following theorem can be obtained by using a similar result in [15] and
the fact that our tuple-deletion based repairs are exactly those considered in
[15], and every repair in our sense that is not one of those contains at least one
tuple insertion.

Theorem 3. Consistent query answering for first-order queries and non-conflicting
sets of ICs of the form (1) or NNCs is $\Pi^P_2$-complete. □

In the proof of this theorem NNCs are not needed for hardness. Actually, hard- 
ness can be obtained with boolean queries. 

5 Repair Logic Programs 

The stable models semantics was introduced in [18] to give a semantics to
disjunctive logic programs that are non-stratified, i.e. that contain recursive
definitions that contain weak negation. By now it is the standard semantics for such
programs. Under this semantics, a program may have several stable models; and
what is true of the program is what is true in all its stable models (a cautious
semantics).

Repairs of relational databases can be specified as stable models of disjunctive
logic programs. Such programs have been presented before, but they were based
on classic IC satisfaction, which differs from the one introduced in Section 3.

The repair programs we will present now implement the repair semantics
introduced in Section 4 for a set of RIC-acyclic constraints. The repair programs
use annotation constants with the intended, informal semantics shown in the
table below. The annotations are used in an extra attribute introduced in each
database predicate; so for a predicate $P \in \mathcal{R}$, the new version of it, $P\_$, contains
an extra attribute.



  Annotation   Atom                      The tuple $P(\bar{a})$ is...
  $t_a$        $P\_(\bar{a}, t_a)$       advised to be made true
  $f_a$        $P\_(\bar{a}, f_a)$       advised to be made false
  $t^*$        $P\_(\bar{a}, t^*)$       true or becomes true
  $t^{**}$     $P\_(\bar{a}, t^{**})$    it is true in the repair



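In the repair program, null is treated as any other constant in $\mathcal{U}$, and therefore
the $IsNull(x)$ atom can be replaced by $x = null$.

Definition 9. Given a database instance $D$, a set IC of UICs, RICs and NNCs,
the repair program $\Pi(D, IC)$ contains the following rules:

1. Facts: $P(\bar{a})$ for each atom $P(\bar{a}) \in D$.

2. For every UIC $\psi$ of form (2), the rules:

   $\bigvee_{i=1}^{n} P_i\_(\bar{x}_i, f_a) \vee \bigvee_{j=1}^{m} Q_j\_(\bar{y}_j, t_a) \leftarrow \bigwedge_{i=1}^{n} P_i\_(\bar{x}_i, t^*), \bigwedge_{Q_j \in Q'} Q_j\_(\bar{y}_j, f_a), \bigwedge_{Q_k \in Q''} not\ Q_k(\bar{y}_k), \bigwedge_{x_l \in \mathcal{A}(\psi) \cap \bar{x}} x_l \neq null, \bar{\varphi},$

   for every set $Q'$ and $Q''$ of atoms appearing in formula $\psi$ such that $Q' \cup Q'' =
   \bigcup_{j=1}^{m} Q_j(\bar{y}_j)$ and $Q' \cap Q'' = \emptyset$.⁹ Here $\mathcal{A}(\psi)$ is the set of relevant attributes
   for $\psi$, $\bar{x} = \bigcup_{i=1}^{n} \bar{x}_i$ and $\bar{\varphi}$ is a conjunction of built-ins that is equivalent to
   the negation of $\varphi$.

3. For every RIC of form (3), the rules:

   $P\_(\bar{x}, f_a) \vee Q\_(\bar{x}', \overline{null}, t_a) \leftarrow P\_(\bar{x}, t^*), not\ aux(\bar{x}'), \bar{x}' \neq null.$

   And for every $y_i \in \bar{y}$:

   $aux(\bar{x}') \leftarrow Q\_(\bar{x}', \bar{y}, t^*), not\ Q\_(\bar{x}', \bar{y}, f_a), \bar{x}' \neq null, y_i \neq null.$

4. For every NNC of the form (5), the rule:

   $P\_(\bar{x}, f_a) \leftarrow P\_(\bar{x}, t^*), x_i = null.$

5. For each predicate $P \in \mathcal{R}$, the annotation rules:

   $P\_(\bar{x}, t^*) \leftarrow P(\bar{x}).$   $P\_(\bar{x}, t^*) \leftarrow P\_(\bar{x}, t_a).$

6. For every predicate $P \in \mathcal{R}$, the interpretation rule:

   $P\_(\bar{x}, t^{**}) \leftarrow P\_(\bar{x}, t^*), not\ P\_(\bar{x}, f_a).$

7. For every predicate $P \in \mathcal{R}$, the program denial constraint:

   $\leftarrow P\_(\bar{x}, t_a), P\_(\bar{x}, f_a).$ □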

Facts in 1. are the elements of the database. Rules 2., 3. and 4. capture, in the
right-hand side, the violation of ICs of the forms (2), (3), and (5), resp., and,
with the left-hand side, the intended way of restoring consistency. The sets of
predicates $Q'$ and $Q''$ are used to check that in all the possible combinations,
the consequent of a UIC is not being satisfied. Since the satisfaction of UICs and
RICs needs to be checked only if none of the relevant attributes of the antecedent
are null, we use $x \neq null$ in rule 2. and in the first two rules in 3. (as usual,
$\bar{x}' \neq null$ means the conjunction of the atoms $x_j \neq null$ for $x_j \in \bar{x}'$). Notice that
rules 3. are implicitly based on the fact that the relevant attributes for a RIC
of the form (3) are $\mathcal{A} = \{x \mid x \in \bar{x}'\}$. Rules 5. capture the atoms that are part
of the inconsistent database or that become true in the repair process; and rules
6. those that become true in the repairs. Rule 7. enforces, by discarding models,
that no atom can be made both true and false in a repair.

Example 21. (Example 19 cont.) The repair program $\Pi(D, IC)$ is the following:

1. $R(a, b).$  $R(a, c).$  $S(e, f).$  $S(null, a).$

2. $R\_(x, y, f_a) \vee R\_(x, z, f_a) \leftarrow R\_(x, y, t^*), R\_(x, z, t^*), y \neq z, x \neq null.$

3. $S\_(u, x, f_a) \vee R\_(x, null, t_a) \leftarrow S\_(u, x, t^*), not\ aux(x), x \neq null.$
   $aux(x) \leftarrow R\_(x, y, t^*), not\ R\_(x, y, f_a), x \neq null, y \neq null.$

5. $R\_(x, y, t^*) \leftarrow R\_(x, y, t_a).$  $R\_(x, y, t^*) \leftarrow R(x, y).$  (similarly for $S$)

6. $R\_(x, y, t^{**}) \leftarrow R\_(x, y, t_a).$
   $R\_(x, y, t^{**}) \leftarrow R(x, y), not\ R\_(x, y, f_a).$  (similarly for $S$)

7. $\leftarrow R\_(x, y, t_a), R\_(x, y, f_a).$  $\leftarrow S\_(x, y, t_a), S\_(x, y, f_a).$

Only rules 2. and 3. depend on the ICs: rules 2. for the UIC, and 3. for the RIC.
They say how to repair the inconsistencies. In rule 2., $Q' = Q'' = \emptyset$, because
there is no database predicate in the consequent of the UIC. There is no rule 4.,
because there is no NNC. □

⁹ We are assuming in this definition that the rules are a direct translation of the
  original ICs introduced in Section 3; in particular, the same variables are used and
  the standardization conditions about their occurrences are respected in the program.

Example 22. Consider $D = \{P(a, b), P(c, null)\}$ and the non-conflicting set of
ICs: $\{\forall xy\,(P(x, y) \rightarrow R(x) \vee S(y)),\ \forall xy\,(P(x, y) \wedge IsNull(y) \rightarrow false)\}$. Then $\Pi(D, IC)$ is:

1. $P(a, b).$  $P(c, null).$

2. $P\_(x, y, f_a) \vee R\_(x, t_a) \vee S\_(y, t_a) \leftarrow P\_(x, y, t^*), R\_(x, f_a), S\_(y, f_a), x \neq null, y \neq null.$
   $P\_(x, y, f_a) \vee R\_(x, t_a) \vee S\_(y, t_a) \leftarrow P\_(x, y, t^*), R\_(x, f_a), not\ S(y), x \neq null, y \neq null.$
   $P\_(x, y, f_a) \vee R\_(x, t_a) \vee S\_(y, t_a) \leftarrow P\_(x, y, t^*), not\ R(x), S\_(y, f_a), x \neq null, y \neq null.$
   $P\_(x, y, f_a) \vee R\_(x, t_a) \vee S\_(y, t_a) \leftarrow P\_(x, y, t^*), not\ R(x), not\ S(y), x \neq null, y \neq null.$

4. $P\_(x, y, f_a) \leftarrow P\_(x, y, t^*), y = null.$

5. $P\_(x, y, t^*) \leftarrow P\_(x, y, t_a).$  $P\_(x, y, t^*) \leftarrow P(x, y).$  (similarly for $R$ and $S$)

6. $P\_(x, y, t^{**}) \leftarrow P\_(x, y, t_a).$
   $P\_(x, y, t^{**}) \leftarrow P(x, y), not\ P\_(x, y, f_a).$  (similarly for $R$ and $S$)

7. $\leftarrow P\_(x, y, t_a), P\_(x, y, f_a).$  (similarly for $R$ and $S$)

The rules in 2. are constructed by choosing all the possible sets $Q'$ and $Q''$ such
that $Q' \cup Q'' = \{R(x), S(y)\}$ and $Q' \cap Q'' = \emptyset$. The first rule in 2. corresponds to
$Q' = \{R(x), S(y)\}$ and $Q'' = \emptyset$, the second to $Q' = \{R(x)\}$ and $Q'' = \{S(y)\}$,
the third to $Q' = \{S(y)\}$ and $Q'' = \{R(x)\}$, and the fourth to $Q' = \emptyset$ and
$Q'' = \{R(x), S(y)\}$. □

The repair program can be run by a logic programming system that computes
the stable models semantics, e.g. the DLV system [21]. The repairs can be obtained
by collecting the atoms annotated with $t^{**}$ in the stable models of the program.

Definition 10. Let $M$ be a stable model of program $\Pi(D, IC)$. The database
instance associated with $M$ is $D_M = \{P(\bar{a}) \mid P \in \mathcal{R}$ and $P\_(\bar{a}, t^{**}) \in M\}$. □

Example 23. (Example 21 cont.) The program has four stable models (the
facts of the program are omitted for simplicity):

$M_1 = \{R\_(a, b, t^*), R\_(a, c, t^*), S\_(e, f, t^*), S\_(null, a, t^*), aux(a), S\_(e, f, t^{**}),
  S\_(null, a, t^{**}), R\_(f, null, t_a), R\_(a, b, t^{**}), R\_(a, c, f_a), R\_(f, null, t^*),
  R\_(f, null, t^{**})\}$,

$M_2 = \{R\_(a, b, t^*), R\_(a, c, t^*), S\_(e, f, t^*), S\_(null, a, t^*), aux(a), S\_(e, f, t^{**}),
  S\_(null, a, t^{**}), R\_(f, null, t_a), R\_(a, b, f_a), R\_(a, c, t^{**}), R\_(f, null, t^*),
  R\_(f, null, t^{**})\}$,

$M_3 = \{R\_(a, b, t^*), R\_(a, c, t^*), S\_(e, f, t^*), S\_(null, a, t^*), aux(a), S\_(e, f, f_a),
  S\_(null, a, t^{**}), R\_(a, b, t^{**}), R\_(a, c, f_a)\}$,

$M_4 = \{R\_(a, b, t^*), R\_(a, c, t^*), S\_(e, f, t^*), S\_(null, a, t^*), aux(a), S\_(e, f, f_a),
  S\_(null, a, t^{**}), R\_(a, b, f_a), R\_(a, c, t^{**})\}$.

The databases associated with the models select the atoms annotated with $t^{**}$: $D_1 =
\{S(e, f), S(null, a), R(a, b), R(f, null)\}$, $D_2 = \{S(e, f), S(null, a), R(a, c), R(f,
null)\}$, $D_3 = \{S(null, a), R(a, b)\}$ and $D_4 = \{S(null, a), R(a, c)\}$. As expected,
these are the repairs obtained in Example 19. □
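
The extraction of Definition 10 can be sketched in a few lines of Python. The encoding is an assumption made for illustration: a stable model is a set of `(predicate, arguments)` pairs, annotated predicates carry a trailing underscore in their name, and the annotation constants are the strings `"ta"`, `"fa"`, `"t*"` and `"t**"`. The function name `database_of_model` is hypothetical, and the model itself would in practice be produced by a stable model solver.

```python
def database_of_model(model, schema):
    """Collect the atoms P(a) such that P_(a, t**) belongs to the stable model."""
    return {
        (pred[:-1], args[:-1])
        for (pred, args) in model
        if pred.endswith("_") and pred[:-1] in schema and args[-1] == "t**"
    }


# Stable model M3 of Example 23 (program facts omitted, as in the text).
M3 = {
    ("R_", ("a", "b", "t*")), ("R_", ("a", "c", "t*")),
    ("S_", ("e", "f", "t*")), ("S_", ("null", "a", "t*")),
    ("aux", ("a",)), ("S_", ("e", "f", "fa")),
    ("S_", ("null", "a", "t**")), ("R_", ("a", "b", "t**")),
    ("R_", ("a", "c", "fa")),
}

print(sorted(database_of_model(M3, {"R", "S"})))
# [('R', ('a', 'b')), ('S', ('null', 'a'))]
```

The result corresponds to the repair $D_3$ listed in the example.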



Theorem 4. Let IC be a RIC-acyclic set of UICs, RICs and NNCs. If $M$ is
a stable model of $\Pi(D, IC)$, then $D_M$ is a repair of $D$ with respect to IC.
Furthermore, the repairs obtained in this way are all the repairs of $D$. □

6 Head-Cycle-Free Programs

In some cases, the repair programs introduced in Section 5 can be transformed
into equivalent non-disjunctive programs. This is the case when they become
head-cycle-free. Query evaluation from such programs has lower computational
complexity than from general disjunctive programs; actually, the data complexity
is reduced from $\Pi^P_2$-complete to coNP-complete. We briefly recall
their definition.

The dependency graph of a ground disjunctive program $\Pi$ is the directed
graph that has ground atoms as vertices, and an edge from atom $A$ to atom $B$
iff there is a rule with $A$ (positive) in the body and $B$ (positive) in the head. $\Pi$
is head-cycle free (HCF) iff its dependency graph does not contain any directed
cycles passing through two atoms in the head of the same rule. A disjunctive
program $\Pi$ is HCF if its ground version is HCF.
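
The HCF test on a ground program can be sketched as follows. The encoding is an assumption for illustration: each ground rule is a pair `(head_atoms, positive_body_atoms)` with atoms as strings, negative body literals are simply left out because they do not contribute edges, and the names `reachable` and `is_hcf` are hypothetical. A cycle through two head atoms is detected by checking that each of them can reach the other in the dependency graph.

```python
from itertools import combinations


def reachable(graph, start):
    """Vertices reachable from `start` through at least one edge."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        vertex = stack.pop()
        if vertex not in seen:
            seen.add(vertex)
            stack.extend(graph.get(vertex, ()))
    return seen


def is_hcf(rules):
    """True iff no directed cycle passes through two head atoms of the same rule."""
    graph = {}
    for head, positive_body in rules:
        for body_atom in positive_body:
            graph.setdefault(body_atom, set()).update(head)
    for head, _ in rules:
        for a, b in combinations(set(head), 2):
            if b in reachable(graph, a) and a in reachable(graph, b):
                return False
    return True


# Two toy ground programs (hypothetical, not taken from the paper).
hcf_program = [(["p", "q"], []), (["p"], ["q"])]
non_hcf_program = [(["p", "q"], []), (["p"], ["q"]), (["q"], ["p"])]

print(is_hcf(hcf_program))      # True
print(is_hcf(non_hcf_program))  # False
```

In the second toy program the atoms p and q depend positively on each other and also share a rule head, which is exactly the situation the definition excludes.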

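A HCF program $\Pi$ can be transformed into a non-disjunctive normal program
$sh(\Pi)$ that has the same stable models. It is obtained by replacing every
disjunctive rule of the form $\bigvee_{i=1}^{n} P_i(\bar{x}_i) \leftarrow \bigwedge_{j=1}^{m} Q_j(\bar{y}_j), \varphi$ by the $n$ rules
$P_i(\bar{x}_i) \leftarrow \bigwedge_{j=1}^{m} Q_j(\bar{y}_j), \varphi, \bigwedge_{k \neq i} not\ P_k(\bar{x}_k)$, for $i = 1, \ldots, n$.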
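
The shift transformation just described can be sketched on a single rule as follows. The rule encoding (a list of head atoms and a list of body literals, all plain strings) and the names `shift` and `show` are assumptions made for this example. The input is a ground instance of rule 2. of Example 21 with its built-in comparisons already evaluated and dropped; the transformation is only guaranteed to preserve the stable models when the whole program is HCF.

```python
def shift(head, body):
    """Replace the disjunctive rule `head <- body` by one normal rule per head atom."""
    rules = []
    for i, atom in enumerate(head):
        negated_others = [f"not {other}" for k, other in enumerate(head) if k != i]
        rules.append((atom, list(body) + negated_others))
    return rules


def show(rule):
    """Render a normal rule as text."""
    atom, body = rule
    return f"{atom} <- {', '.join(body)}."


head = ["R_(a,b,fa)", "R_(a,c,fa)"]
body = ["R_(a,b,t*)", "R_(a,c,t*)"]

for rule in shift(head, body):
    print(show(rule))
# R_(a,b,fa) <- R_(a,b,t*), R_(a,c,t*), not R_(a,c,fa).
# R_(a,c,fa) <- R_(a,b,t*), R_(a,c,t*), not R_(a,b,fa).
```

Each resulting rule derives one head atom provided the other head atoms are not derived, so the disjunction is replaced by negation as failure.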

For certain classes of queries and ICs, consistent query answering has a data
complexity lower than $\Pi^P_2$, a sharp lower bound as seen in Theorem 3 (c.f. also
[15]). In those cases, it is natural to consider this kind of transformations of the
disjunctive repair program. In the rest of this section we will consider sets IC of
integrity constraints formed by UICs, RICs and NNCs.

Definition 11. A predicate $P$ is bilateral with respect to IC if it belongs to
the antecedent of a constraint $ic_1 \in IC$ and to the consequent of a constraint
$ic_2 \in IC$, where $ic_1$ and $ic_2$ are not necessarily different. □

Example 24. If $IC = \{\forall x\,(T(x) \rightarrow \exists y\, R(x, y)),\ \forall xy\,(S(x, y) \rightarrow T(x))\}$, the only
bilateral predicate is $T$. □

Theorem 5. For a set IC of UICs, RICs and NNCs, if for every $ic \in IC$, it holds
that (a) $ic$ has no bilateral predicates; or (b) $ic$ has exactly one occurrence of a
bilateral predicate (without repetitions), then the program $\Pi(D, IC)$ is HCF. □

For example, if in IC we have the constraint $P(x, y) \rightarrow P(y, x)$, then $P$ is a
bilateral predicate, and the condition in the theorem is not satisfied. Actually, the
program $\Pi(D, IC)$ is not HCF. If we have instead $P(x, a) \rightarrow P(x, b)$, even though
the condition is not satisfied, the program is HCF. Therefore, the condition is
sufficient, but not necessary for the program to be HCF.
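
The syntactic condition of Theorem 5 is easy to check mechanically. In the sketch below, each constraint is abstracted as a pair of lists holding the predicate names of its antecedent and of its consequent, with repetitions kept. This abstraction and the names `bilateral_predicates` and `satisfies_theorem5_condition` are assumptions made for illustration.

```python
def bilateral_predicates(ics):
    """Predicates occurring in some antecedent and in some consequent."""
    in_antecedent = {p for antecedent, _ in ics for p in antecedent}
    in_consequent = {p for _, consequent in ics for p in consequent}
    return in_antecedent & in_consequent


def satisfies_theorem5_condition(ics):
    """True iff every constraint has at most one occurrence of a bilateral predicate."""
    bilateral = bilateral_predicates(ics)
    return all(
        sum(1 for p in list(antecedent) + list(consequent) if p in bilateral) <= 1
        for antecedent, consequent in ics
    )


# Constraints of Example 24: T(x) -> exists y R(x, y) and S(x, y) -> T(x).
example_24 = [(["T"], ["R"]), (["S"], ["T"])]
# The constraint P(x, y) -> P(y, x) discussed after Theorem 5.
symmetric = [(["P"], ["P"])]

print(bilateral_predicates(example_24))          # {'T'}
print(satisfies_theorem5_condition(example_24))  # True
print(satisfies_theorem5_condition(symmetric))   # False
```

Since the condition is only sufficient, a `False` result does not by itself show that the repair program fails to be HCF.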

This theorem can be immediately applied to useful classes of ICs, like denial 
constraints, because they do not have any bilateral literals, and in consequence, 
the repair program is HCF. 

Corollary 1. If IC contains only constraints of the form $\forall(\bigwedge_{i=1}^{n} P_i(\bar{t}_i) \rightarrow \varphi)$,
where $P_i(\bar{t}_i)$ is a database atom and $\varphi$ is a formula containing built-in predicates
only, then $\Pi(D, IC)$ is HCF. □

As a consequence of this corollary we obtain, for first-order queries and this
class of ICs, that CQA belongs to coNP, because a query program (that is
non-disjunctive) together with the repair program is still HCF. For this class of
constraints, with the classical tuple-deletion based semantics, this problem becomes
coNP-complete. Actually, CQA for this class with our tuple-deletion/null-value
based semantics is still coNP-complete, because the same reduction used for the
tuple-deletion based semantics can be used in our case.

7 Conclusions

We have introduced a new repair semantics that considers, systematically and
for the first time, the possible occurrence of null values in a database in the
form we find them present and treated in current commercial implementations.
Null values of the same kind are also used to restore the consistency of the
database. The new semantics applies to a wide class of ICs, including cyclic sets
of referential ICs.

We established the decidability of CQA under this semantics, and a tight
lower and upper bound was presented. The repairs under this semantics can be
specified as the stable models of a disjunctive logic program
for acyclic foreign key constraints, universal ICs and NOT NULL-constraints,
covering all the usual ICs found in database practice.

In an extended version of this paper we will provide: (a) An extension of
our semantics of IC satisfaction in databases with null values that can also be
applied to query answering in the same kind of databases. (b) A more detailed
analysis of the way null values are propagated in a controlled manner, in such
a way that no infinite loops are created. (c) Construction of repairs based on a
sequence of "local" repairs for the individual ICs.